Results 1 - 20 of 21
1.
IEEE Trans Image Process ; 33: 2158-2170, 2024.
Article in English | MEDLINE | ID: mdl-38470575

ABSTRACT

Depth information opens up new opportunities for video object segmentation (VOS) to be more accurate and robust in complex scenes. However, the RGBD VOS task is largely unexplored due to the expensive collection of RGBD data and the time-consuming annotation of segmentation masks. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes. We further propose a novel, strong baseline model, the Fused Color-Depth Network (FusedCDNet), which can be trained solely under the supervision of bounding boxes, yet at inference generates masks from a bounding-box prompt given only in the first frame. The model thereby possesses three major advantages: a weakly-supervised training strategy that avoids high-cost annotation, a cross-modal fusion module to handle complex scenes, and weakly-supervised inference that promotes ease of use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at https://github.com/yjybuaa/depthvos/ to facilitate the development of RGBD VOS.
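As a rough illustration of what a cross-modal fusion module of this kind can look like, the PyTorch sketch below gates depth features with a learned spatial attention map before adding them to the colour stream. The module name, layer sizes and gating scheme are assumptions chosen for illustration; this is not the authors' FusedCDNet implementation.

    import torch
    import torch.nn as nn

    class CrossModalFusion(nn.Module):
        """Toy RGB-depth feature fusion: depth features are re-weighted by a
        learned gate and added to the colour features."""
        def __init__(self, channels: int):
            super().__init__()
            self.attn = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid(),  # per-pixel, per-channel gate in [0, 1]
            )

        def forward(self, rgb_feat, depth_feat):
            gate = self.attn(torch.cat([rgb_feat, depth_feat], dim=1))
            return rgb_feat + gate * depth_feat

    fusion = CrossModalFusion(channels=64)
    fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    print(fused.shape)  # torch.Size([1, 64, 32, 32])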

2.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 1594-1605, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35298375

ABSTRACT

Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool that produces input sequences optimized to trigger instabilities, which can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.
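One of the two training-time constraints mentioned above, bounding the spectral norm of the convolutional layers, can be approximated with PyTorch's standard spectral-norm parametrization, as in the sketch below. This is a generic illustration of that constraint, not the SRN-C algorithm itself, and the toy recurrent cell is an assumption.

    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    class StableRecurrentCell(nn.Module):
        """Tiny recurrent video-processing cell whose convolutions have their
        (reshaped) weight matrices constrained to unit spectral norm."""
        def __init__(self, channels: int):
            super().__init__()
            self.in_conv = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
            self.rec_conv = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))

        def forward(self, x, h):
            return torch.tanh(self.in_conv(x) + self.rec_conv(h))

    cell = StableRecurrentCell(16)
    h = torch.zeros(1, 16, 64, 64)
    for _ in range(5):  # unroll over a short sequence of random frames
        h = cell(torch.randn(1, 16, 64, 64), h)
    print(h.shape)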

3.
IEEE Trans Image Process ; 31: 5923-5935, 2022.
Article in English | MEDLINE | ID: mdl-36074867

ABSTRACT

High dynamic range (HDR) imaging is of fundamental importance in modern digital photography pipelines and is used to produce a high-quality photograph with well-exposed regions despite varying illumination across the image. This is typically achieved by merging multiple low dynamic range (LDR) images taken at different exposures. However, over-exposed regions and misalignment errors due to poorly compensated motion result in artefacts such as ghosting. In this paper, we present a new HDR imaging technique that specifically models alignment and exposure uncertainties to produce high-quality HDR results. We introduce a strategy that learns to jointly align the frames and assess alignment and exposure reliability using an HDR-aware, uncertainty-driven attention map, which robustly merges the frames into a single high-quality HDR image. Further, we introduce a progressive, multi-stage image fusion approach that can flexibly merge any number of LDR images in a permutation-invariant manner. Experimental results show our method can produce better-quality HDR images, with up to 1.1 dB PSNR improvement over the state of the art, and subjective improvements in terms of better detail, colours, and fewer artefacts.
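The permutation-invariant merging idea can be sketched as follows: each aligned LDR frame receives a per-pixel score, the scores are normalized by a softmax across frames, and the frames are combined as a weighted sum, so the result depends neither on the order nor on the number of frames. The layer below is a hypothetical, minimal stand-in for the paper's attention-driven merge, not the published network.

    import torch
    import torch.nn as nn

    class SoftmaxMerge(nn.Module):
        """Merge any number of aligned frames with per-pixel softmax weights."""
        def __init__(self, channels: int):
            super().__init__()
            self.score = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

        def forward(self, frames):
            # frames: (N, C, H, W) -- N exposures of the same scene, already aligned
            weights = torch.softmax(self.score(frames), dim=0)  # normalize across frames
            return (weights * frames).sum(dim=0)                # (C, H, W)

    merge = SoftmaxMerge(channels=3)
    hdr = merge(torch.rand(4, 3, 128, 128))  # works for any number of exposures
    print(hdr.shape)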

5.
Vision Res ; 197: 108058, 2022 08.
Article in English | MEDLINE | ID: mdl-35487146

ABSTRACT

In this paper we consider recent advances in the use of deep convolutional neural networks to understand biological vision. We focus on claims about the plausibility of feedforward deep convolutional neural networks (fDCNNs) as models of image classification in the biological system. Despite the putative similarity of these networks to some properties of the biological visual system, and the remarkable classification accuracy some fDCNNs achieve, we argue that their plausibility as a framework for understanding image classification remains unclear. We highlight two key issues that we suggest are relevant to the evaluation of any form of DNN used to examine biological vision: (1) network transparency under analysis, that is, the challenge of understanding what networks do and how they do it; and (2) identifying appropriate benchmarks for comparing network performance and the biological system using both quantitative and qualitative performance measures. We show that there are important divergences between fDCNNs and biological vision that reflect fundamental differences in the computational architectures and representational structures supporting image classification in these networks and in the biological system.


Subjects
Neural Networks, Computer; Humans
6.
IEEE Trans Med Imaging ; 41(1): 199-212, 2022 01.
Article in English | MEDLINE | ID: mdl-34460369

ABSTRACT

Data-driven deep learning approaches to image registration can be less accurate than conventional iterative approaches, especially when training data is limited. To address this issue while retaining the fast inference speed of deep learning, we propose VR-Net, a novel cascaded variational network for unsupervised deformable image registration. Using a variable splitting optimization scheme, we first convert the image registration problem, established in a generic variational framework, into two sub-problems, one with a point-wise, closed-form solution and the other a denoising problem. We then propose two neural layers (a warping layer and an intensity consistency layer) to model the analytical solution, and a residual U-Net (termed the generalized denoising layer) to formulate the denoising problem. Finally, we cascade the three neural layers multiple times to form our VR-Net. Extensive experiments on three cardiac magnetic resonance imaging datasets (two 2D and one 3D) show that VR-Net outperforms state-of-the-art deep learning methods in registration accuracy, whilst maintaining the fast inference speed of deep learning and the data efficiency of variational models. The warping layer has a standard realization, sketched after the subject terms below.


Subjects
Image Processing, Computer-Assisted; Magnetic Resonance Imaging
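The warping layer mentioned in the abstract resamples the moving image at positions shifted by the estimated displacement field. The 2D sketch below shows that generic operation with PyTorch's grid_sample; it illustrates the idea and is not necessarily identical to the layer used in VR-Net.

    import torch
    import torch.nn.functional as F

    def warp(image, flow):
        """Warp `image` (B, C, H, W) by a dense displacement field `flow`
        (B, 2, H, W) given in pixels (channel 0 = dx, channel 1 = dy),
        using bilinear interpolation."""
        _, _, h, w = image.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().to(image)   # identity coordinates
        new = grid.unsqueeze(0) + flow                          # displaced coordinates
        # normalize to [-1, 1] as required by grid_sample (x first, then y)
        new_x = 2.0 * new[:, 0] / (w - 1) - 1.0
        new_y = 2.0 * new[:, 1] / (h - 1) - 1.0
        sample_grid = torch.stack((new_x, new_y), dim=-1)       # (B, H, W, 2)
        return F.grid_sample(image, sample_grid, align_corners=True)

    moving = torch.rand(1, 1, 64, 64)
    flow = torch.zeros(1, 2, 64, 64)  # zero displacement -> identity warp
    assert torch.allclose(warp(moving, flow), moving, atol=1e-5)
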
7.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3366-3385, 2022 07.
Article in English | MEDLINE | ID: mdl-33544669

ABSTRACT

Artificial neural networks thrive at solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, and attempts to extend this knowledge without targeting the original task result in catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task-incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern: (1) a taxonomy and extensive overview of the state of the art; (2) a novel framework for continually determining the stability-plasticity trade-off of the continual learner; and (3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny ImageNet, the large-scale unbalanced iNaturalist dataset, and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage.


Subjects
Algorithms; Learning; Neural Networks, Computer
8.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7705-7717, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34559636

ABSTRACT

Image demoireing is a multi-faceted image restoration task involving both moire pattern removal and color restoration. In this paper, we formulate a general degradation model to describe an image contaminated by moire patterns, and propose a novel multi-scale bandpass convolutional neural network (MBCNN) for single-image demoireing. For moire pattern removal, we propose multi-block-size learnable bandpass filters (M-LBFs), based on a block-wise frequency-domain transform, to learn the frequency-domain priors of moire patterns. We also introduce a new loss function, named Dilated Advanced Sobel loss (D-ASL), to better capture frequency information. For color restoration, we propose a two-step tone-mapping strategy, which first applies global tone mapping to correct a global color shift and then fine-tunes the color of each pixel locally. To determine the most appropriate frequency-domain transform, we investigate several transforms including the DCT, DFT, DWT, a learnable non-linear transform and a learnable orthogonal transform, and finally adopt the DCT. Our basic model won the AIM2019 demoireing challenge. Experimental results on three public datasets show that our method outperforms state-of-the-art methods by a large margin.
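The block-wise frequency-domain transform that the learnable bandpass filters operate on can be illustrated with a plain 8x8 block DCT, the transform the paper finally adopts. The snippet below only sketches that front end, with a crude hand-made "bandpass" step; the learnable filters and the loss are not reproduced.

    import numpy as np
    from scipy.fft import dctn

    def blockwise_dct(image, block=8):
        """Type-II DCT applied independently to non-overlapping blocks."""
        h, w = image.shape
        out = np.zeros_like(image, dtype=float)
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                out[y:y + block, x:x + block] = dctn(
                    image[y:y + block, x:x + block], norm="ortho")
        return out

    img = np.random.rand(64, 64)
    coeffs = blockwise_dct(img)
    # crude band selection: zero each block's DC coefficient, which removes slowly
    # varying content while keeping the higher frequencies where moire tends to live
    coeffs[::8, ::8] = 0.0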

9.
Sensors (Basel) ; 20(17)2020 Aug 19.
Article in English | MEDLINE | ID: mdl-32825013

ABSTRACT

This work examines the differences between a human and a machine in object recognition tasks. A machine is useful only insofar as its output classification labels are correct and match the dataset-provided labels. However, a discrepancy very often occurs because the dataset label differs from the one a human expects. To correct this, we introduce the concept of the target user population. The paper presents a complete methodology for either adapting the output of a pre-trained, state-of-the-art object classification algorithm to the target population or inferring a proper, user-friendly categorization from the target population. We call this process 'user population re-targeting'. The methodology includes a set of specially designed population tests, which provide crucial data about the categorization that the target population prefers. The transformation between the dataset-bound categorization and the new, population-specific categorization is called the 'Cognitive Relevance Transform'. Experiments on well-known datasets show that the target population preferred the transformed categorization by a large margin, that the performance of human observers is probably better than previously thought, and that the outcome of re-targeting may be difficult to predict without actual tests on the target population.


Subjects
Algorithms; Cognition; Visual Perception; Humans
10.
IEEE Trans Cybern ; 48(8): 2485-2499, 2018 Aug.
Article in English | MEDLINE | ID: mdl-28885166

ABSTRACT

This paper presents a novel robust method for single-target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object's color distribution is reasonably motion-invariant, this is not true for the target's depth distribution, which continually varies as the target moves relative to the camera. It is therefore nontrivial to design target models which can fully exploit (potentially very rich) depth information for target tracking. For this reason, much of the previous RGB-D literature relies on color information for tracking, while exploiting depth information only for occlusion reasoning. In contrast, we propose an adaptive range-invariant target depth model, and show how both depth and color information can be fully and adaptively fused during the search for the target in each new RGB-D image. We introduce a new, hierarchical, two-layered target model (comprising local and global models) which uses spatio-temporal consistency constraints to achieve stable and robust on-the-fly target relearning. In the global layer, multiple features, derived from both color and depth data, are adaptively fused to find a candidate target region. In ambiguous frames, where one or more features disagree, this global candidate region is further decomposed into smaller local candidate regions for matching to local-layer models of small target parts. We also note that the conventional use of depth data for occlusion reasoning can easily trigger false occlusion detections when the target moves rapidly toward the camera. To overcome this problem, we show how combining target information with contextual information enables the target's depth constraint to be relaxed. Our adaptively relaxed depth constraints can robustly accommodate large and rapid target motion in the depth direction, while still enabling the use of depth data for highly accurate reasoning about occlusions. For evaluation, we introduce a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis. Our tracker is evaluated using two state-of-the-art methodologies, VOT and the object tracking benchmark, and in both cases it significantly outperforms four other state-of-the-art RGB-D trackers from the literature.
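A caricature of the global-layer idea of adaptively fusing colour and depth cues is reliability-weighted averaging of per-pixel target-likelihood maps, as below. The variance-based reliability weight is an assumption chosen purely for illustration and is not the authors' formulation.

    import numpy as np

    def fuse_likelihoods(color_map, depth_map):
        """Fuse two per-pixel target-likelihood maps, weighting each cue by a
        crude measure of how informative it currently is."""
        def reliability(m):
            return m.var() + 1e-6   # a flat map carries little information
        w_c, w_d = reliability(color_map), reliability(depth_map)
        return (w_c * color_map + w_d * depth_map) / (w_c + w_d)

    color = np.random.rand(60, 80)
    depth = np.random.rand(60, 80)
    fused = fuse_likelihoods(color, depth)
    candidate = np.unravel_index(np.argmax(fused), fused.shape)  # candidate target position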

11.
PLoS One ; 12(1): e0169411, 2017.
Article in English | MEDLINE | ID: mdl-28046074

ABSTRACT

This paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of the input data; transparency, which enables insights into the learned representation; and the robustness and speed that make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers, with statistical co-occurrence as the driving force of the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance on multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.


Assuntos
Aprendizagem , Música , Percepção da Altura Sonora , Software , Algoritmos , Humanos , Modelos Estatísticos , Modelos Teóricos , Percepção
12.
IEEE Trans Image Process ; 25(3): 1261-74, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26812723

ABSTRACT

The field of visual tracking evaluation sports a large variety of performance measures and largely suffers from a lack of consensus about which measures should be used in experiments. This makes cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased toward particular tracking aspects. In this paper, we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis, we narrow the set of potential measures down to only two complementary ones, describing accuracy and robustness, thus pushing toward homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized and have been employed by the recent visual object tracking challenges as the foundation for their evaluation methodology.
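The two complementary measures the analysis converges on, accuracy (average overlap on successfully tracked frames) and robustness (number of failures), can be computed roughly as in the sketch below, where a failure is taken to be a frame with zero overlap. Treat this as an informal illustration rather than the official evaluation toolkit.

    import numpy as np

    def iou(a, b):
        """Intersection over union of axis-aligned boxes given as (x, y, w, h)."""
        ax2, ay2 = a[0] + a[2], a[1] + a[3]
        bx2, by2 = b[0] + b[2], b[1] + b[3]
        iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
        ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def accuracy_robustness(pred_boxes, gt_boxes):
        overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
        failures = int((overlaps == 0.0).sum())        # frames where the target was lost
        tracked = overlaps[overlaps > 0.0]
        accuracy = float(tracked.mean()) if tracked.size else 0.0
        return accuracy, failures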

13.
IEEE Trans Pattern Anal Mach Intell ; 38(11): 2137-2155, 2016 11.
Article in English | MEDLINE | ID: mdl-26766217

ABSTRACT

This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset with per-frame annotations of several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes, making it the most systematically constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested in the VOT2014 challenge on the new dataset with 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.

15.
IEEE Trans Cybern ; 44(3): 355-65, 2014 Mar.
Article in English | MEDLINE | ID: mdl-23757555

ABSTRACT

We propose a new method for supervised online estimation of probabilistic discriminative models for classification tasks. The method estimates the class distributions from a stream of data in the form of Gaussian mixture models (GMMs). The reconstructive updates of the distributions are based on the recently proposed online kernel density estimator (oKDE). We keep the number of components in the model low by compressing the GMMs from time to time. We propose a new cost function that measures the loss of inter-class discrimination during compression, thus guiding the compression toward simpler models that still retain their discriminative properties. The resulting classifier updates the GMM of each class independently, but these GMMs interact during compression through the proposed cost function. We call the proposed method the online discriminative kernel density estimator (odKDE). We compare the odKDE to the oKDE, batch state-of-the-art kernel density estimators (KDEs), and batch/incremental support vector machines (SVMs) on publicly available datasets. The odKDE achieves classification performance comparable to that of the best batch KDEs and SVMs, while allowing online adaptation from large datasets, and it produces models of lower complexity than the oKDE.
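GMM compression typically proceeds by repeatedly merging pairs of components; the moment-preserving merge of two weighted Gaussians used in such schemes is sketched below. The discriminative cost that decides which pairs may be merged is the paper's contribution and is not reproduced here.

    import numpy as np

    def merge_gaussians(w1, mu1, cov1, w2, mu2, cov2):
        """Replace two weighted Gaussian components by the single Gaussian with
        the same total weight, mean and covariance (moment matching)."""
        w = w1 + w2
        a1, a2 = w1 / w, w2 / w
        mu = a1 * mu1 + a2 * mu2
        d1, d2 = (mu1 - mu)[:, None], (mu2 - mu)[:, None]
        cov = a1 * (cov1 + d1 @ d1.T) + a2 * (cov2 + d2 @ d2.T)
        return w, mu, cov

    w, mu, cov = merge_gaussians(0.3, np.array([0.0, 0.0]), np.eye(2),
                                 0.7, np.array([2.0, 1.0]), 0.5 * np.eye(2))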

16.
IEEE Trans Pattern Anal Mach Intell ; 35(8): 1847-71, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23787340

ABSTRACT

Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, the review presents functional principles of the processing hierarchies in the primate visual system in light of recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy, in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that this functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.


Assuntos
Inteligência Artificial , Visão Ocular/fisiologia , Córtex Visual/fisiologia , Animais , Humanos , Reconhecimento Visual de Modelos/fisiologia , Primatas
17.
IEEE Trans Pattern Anal Mach Intell ; 35(4): 941-53, 2013 Apr.
Article in English | MEDLINE | ID: mdl-22802114

ABSTRACT

This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target's global and local appearance by interlacing two layers. The local layer in this model is a set of local patches that geometrically constrain the changes in the target's appearance. This layer probabilistically adapts to the target's geometric deformation, while its structure is updated by removing and adding local patches. The addition of these patches is constrained by the global layer, which probabilistically models the target's global visual properties, such as color, shape, and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. Through this coupled constraint between the adaptation of the global and local layers, we achieve more robust tracking through significant appearance changes. We experimentally compare our tracker to 11 state-of-the-art trackers. The experimental results on challenging sequences confirm that our tracker outperforms the related trackers in many cases, with a lower failure rate as well as better accuracy. Furthermore, a parameter analysis shows that our tracker is stable over a range of parameter values.


Assuntos
Processamento de Imagem Assistida por Computador/métodos , Algoritmos , Humanos , Modelos Teóricos , Movimento , Gravação em Vídeo
18.
IEEE Trans Syst Man Cybern B Cybern ; 40(6): 1505-20, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20215085

ABSTRACT

We propose a new dynamic model which can be used within blob trackers to track the target's center of gravity. A strong point of the model is that it is designed to track the variety of motions usually encountered in applications such as pedestrian tracking, hand tracking, and sports. We call it a two-stage dynamic model due to its particular structure, which is a composition of two models: a liberal model and a conservative model. The liberal model allows larger perturbations in the target's dynamics and is able to account for motions ranging between random-walk dynamics and nearly constant-velocity dynamics. The conservative model, on the other hand, assumes smaller perturbations and is used to further constrain the liberal model to the target's current dynamics. We implement the two-stage dynamic model in a two-stage probabilistic tracker based on the particle filter and apply it to two separate examples of blob tracking: 1) tracking entire persons and 2) tracking a person's hands. Experiments show that, in comparison to the widely used models, the proposed two-stage dynamic model allows tracking with a smaller number of particles in the particle filter (e.g., 25 particles), while achieving smaller state-estimation errors and a lower failure rate. The results suggest that the improved performance comes from the model's ability to actively adapt to the target's motion during tracking. The sketch after the subject terms below illustrates the two noise regimes.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Models, Theoretical; Pattern Recognition, Automated/methods; Computer Simulation; Motion
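The contrast between the liberal and the conservative model can be pictured as two process-noise settings on a nearly-constant-velocity state, as in the toy propagation step below for particles of state (x, y, vx, vy). The specific noise values and state layout are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)

    def propagate(particles, dt, q):
        """Nearly-constant-velocity step for particles (N, 4) = (x, y, vx, vy).
        `q` is the process-noise strength: large for a 'liberal' model that also
        tolerates random-walk-like motion, small for a 'conservative' one."""
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
        return particles @ F.T + rng.normal(scale=q, size=particles.shape)

    particles = np.zeros((25, 4))   # as few as 25 particles, as reported in the paper
    liberal = propagate(particles, dt=1.0, q=2.0)
    conservative = propagate(particles, dt=1.0, q=0.2)
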
19.
IEEE Trans Pattern Anal Mach Intell ; 28(3): 337-50, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16526421

ABSTRACT

Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in visual data. Discriminative methods, such as LDA, which are on the other hand better suited for classification tasks, are highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods, which enables one to work on subsets of pixels in images to efficiently detect and reject outliers. The proposed approach is therefore capable of robust classification with a high breakdown point. We also show that subspace methods used for solving regression tasks, such as CCA, can be treated in a similar manner. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms standard discriminative methods in the case of missing pixels and images containing occlusions and outliers. A bare-bones illustration of the underlying reconstruction-from-subsets idea follows the subject terms below.


Subjects
Algorithms; Artificial Intelligence; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Models, Biological; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Cluster Analysis; Computer Simulation; Discriminant Analysis; Humans; Image Enhancement/methods; Information Storage and Retrieval/methods; Models, Statistical; Principal Component Analysis; Regression Analysis; Reproducibility of Results; Sample Size; Sensitivity and Specificity
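The reconstructive property being exploited is that, given a PCA basis, an image's coefficients can be estimated by least squares from any sufficiently large subset of its pixels, so corrupted pixels can simply be left out. The sketch below demonstrates that principle on synthetic data; it is not the authors' full robust-classification procedure.

    import numpy as np

    def coefficients_from_subset(basis, mean, image, mask):
        """Estimate PCA coefficients of a flattened `image` using only the pixels
        where `mask` is True, then reconstruct the full image."""
        U = basis[mask]                         # basis rows at the observed pixels
        y = image[mask] - mean[mask]
        coeffs, *_ = np.linalg.lstsq(U, y, rcond=None)
        return mean + basis @ coeffs            # reconstruction over all pixels

    d, k = 1000, 10                             # toy basis of k eigenvectors over d pixels
    basis = np.linalg.qr(np.random.randn(d, k))[0]
    mean = np.zeros(d)
    clean = basis @ np.random.randn(k)
    mask = np.random.rand(d) > 0.4              # only ~60% of the pixels are observed
    recovered = coefficients_from_subset(basis, mean, clean, mask)
    assert np.allclose(recovered, clean, atol=1e-6)
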
20.
IEEE Trans Image Process ; 12(7): 817-25, 2003.
Article in English | MEDLINE | ID: mdl-18237956

ABSTRACT

In this paper, we propose a novel method for efficiently calculating the eigenvectors of uniformly rotated images of a set of templates. As we show, the images can be optimally approximated by a linear series of eigenvectors which can be calculated without actually decomposing the sample covariance matrix.
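The key observation can be sketched numerically: for uniformly rotated versions of a template, the Gram matrix of the image set is circulant, so its spectrum (and, via Fourier-weighted sums of the rotated images, its eigenvectors) follows from a single row of rotation correlations by a 1-D FFT, with no eigendecomposition of the sample covariance matrix. The snippet below uses a polar-grid representation, where rotation is an exact cyclic shift, to keep the demonstration exact; that simplification is an assumption of the sketch, not part of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_r, n_theta = 20, 36                 # polar grid: radius x angle
    N = n_theta                           # one rotated copy per angular step
    template = rng.random((n_r, n_theta))
    stack = np.stack([np.roll(template, k, axis=1).ravel() for k in range(N)])

    # The Gram matrix of the rotated set is circulant, so its full spectrum is the
    # FFT of a single row of correlations g[k] = <x_0, x_k>.
    g = stack @ stack[0]
    spec_fft = np.sort(np.real(np.fft.fft(g)))[::-1]
    spec_direct = np.sort(np.linalg.eigvalsh(stack @ stack.T))[::-1]
    assert np.allclose(spec_fft, spec_direct)
    # The eigenimages themselves are Fourier-weighted sums of the rotated images,
    # u_m ~ sum_k exp(2*pi*1j*m*k/N) * x_k, again without forming the covariance.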
